Abstract. In this work a stand-alone preprocessor for SAT is presented that is able to
perform most of the known preprocessing techniques. Preprocessing a formula in SAT is
important for performance since redundancy can be removed. The preprocessor is part of the
SAT solver riss [9] and is called Coprocessor. Not only riss, but also MiniSat 2.2 [11] benefits
from it, because the SatELite preprocessor of MiniSat does not implement recent techniques.
By using more advanced techniques, Coprocessor is able to reduce the redundancy in a
formula further and improves the overall solving performance.



1 Introduction 

In theory SAT problems with n variables have a worst case execution time of O(2^n) [2]. Reducing
the number of variables results in a theoretically faster search. However, in practice the number 
of variables does not correlate with the runtime. The number of clauses highly influences the 
performance of unit propagation. Preprocessing helps to reduce the size of the formula by removing 
variables and clauses that are redundant. Due to limited space it is assumed that the reader is 
familiar with basic preprocessing techniques [3]. Preprocessing techniques can be classified into 
two categories: Techniques, which change a formula in a way that the satisfying assignment for the 
preprocessed formula is not necessarily a model for the original formula, are called satisfiability- 
preserving techniques. Thus, for these techniques undo information has to be stored. For the second 
category, this information is not required. The second category is called equivalence-preserving 
techniques, because the preprocessed and original formula are equivalent. 

This paper is structured in the following way. An overview of the implemented techniques
is given in Sect. 2. Details on Coprocessor, a format for storing the undo information and a
comparison to SatELite are given in Sect. 3. Finally, a conclusion is given in Sect. 4.

2 Preprocessor Techniques 

The notation used to describe the preprocessor is the following: variables are numbers and literals
are positive or negative variables, e.g. 2 and ¬2. A clause C is a disjunction of a set of literals,
denoted by [l_1, ..., l_n]. A formula is a conjunction of clauses. The original formula will be referred
to as F, the preprocessed formula is always called F'. Unit propagation on F is denoted by BCP(l),
where l is the literal that is assigned to true.

2.1 Satisfiability-Preserving Techniques 

The following techniques change F in such a way that a model of F' is not necessarily a model of F.
Therefore, these methods need to store undo information. Undoing these methods has to be
done carefully, because the order influences the resulting assignment. All elimination steps
have to be undone in the reverse order of their application [5].



Variable Elimination (VE) [3,13] is a technique to remove variables from the formula. A variable
is removed by resolving the clauses in which it occurs. Given two sets of clauses: C_x, whose
clauses contain the positive literal x, and C_¬x, whose clauses contain ¬x. Let G be the union of
these two sets, G = C_x ∪ C_¬x. Resolving these two sets on variable x results in a new set of
clauses G', in which tautologies are not included. It is shown in [3] that G can be replaced by G'
without changing the satisfiability of the formula. If a model is needed for the original formula,
then the partial model can be extended using the original clauses F to assign variable x. Usually,
applying VE to a variable results in a larger number of clauses. In state-of-the-art preprocessors
VE is only applied to a variable if the number of clauses does not increase. The resulting formula
depends on the order of the eliminated variables. Pure literal elimination is a special case of VE,
because the number of resolvents is zero.
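The resolution step of VE can be illustrated with a minimal Python sketch. It assumes clauses are
frozensets of DIMACS-style integers (negative numbers are negated literals) and that the input
contains no tautologies; the names resolve and eliminate_variable are hypothetical and not taken
from Coprocessor.

```python
from itertools import product

def resolve(c_pos, c_neg, x):
    """Resolve c_pos (contains x) with c_neg (contains -x); None means tautology."""
    resolvent = (c_pos - {x}) | (c_neg - {-x})
    if any(-lit in resolvent for lit in resolvent):
        return None  # tautologies are not included in G'
    return resolvent

def eliminate_variable(formula, x):
    """Return (new_formula, removed_clauses); removed_clauses is the undo information."""
    pos = [c for c in formula if x in c]          # C_x
    neg = [c for c in formula if -x in c]         # C_-x
    rest = [c for c in formula if x not in c and -x not in c]
    resolvents = []
    for p, n in product(pos, neg):
        r = resolve(p, n, x)
        if r is not None and r not in resolvents:
            resolvents.append(r)
    # Heuristic from the text: only eliminate if the clause count does not grow.
    if len(resolvents) > len(pos) + len(neg):
        return list(formula), []
    return rest + resolvents, pos + neg
```

A pure literal yields an empty set of resolvents here, so its clauses are simply removed.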

Blocked Clause Elimination (BCE) [7] removes redundant blocked clauses. A clause C is blocked if
it contains a blocking literal l. A literal l is a blocking literal if l is part of C and for each clause
C' ∈ F with ¬l ∈ C' the resolvent C ⊗_l C' is a tautology [4,7]. Removing a blocked clause from F
changes the satisfying assignments [4]. Since BCE is confluent, the order of the removals does not
change the result [7].
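The blocked clause test can be sketched as follows. This is a naive quadratic illustration under the
assumption that clauses are frozensets of DIMACS-style integers; the function names are
hypothetical, and a real preprocessor would use occurrence lists instead of scanning the formula.

```python
def is_tautology(clause):
    return any(-lit in clause for lit in clause)

def blocking_literal(clause, formula):
    """Return a literal that blocks `clause` with respect to `formula`, or None."""
    for lit in clause:
        partners = [c for c in formula if -lit in c]
        # Every resolvent on `lit` has to be a tautology.
        if all(is_tautology((clause - {lit}) | (c - {-lit})) for c in partners):
            return lit
    return None

def blocked_clause_elimination(formula):
    """Remove blocked clauses until none is left; return (remaining, undo_stack)."""
    remaining = list(formula)
    undo = []  # pairs (blocking literal, blocked clause) in removal order
    changed = True
    while changed:
        changed = False
        for clause in remaining:
            lit = blocking_literal(clause, remaining)
            if lit is not None:
                remaining.remove(clause)
                undo.append((lit, clause))
                changed = True
                break  # restart the scan, the formula has changed
    return remaining, undo
```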

Equivalence Elimination (EE) [5] removes a literal l if it is equivalent to another literal l'. Only
one literal per equivalence class is kept. Equivalent literals can be found by finding strongly
connected components in the binary implication graph (BIG). The BIG represents all implications
in the formula by directed edges ¬l → l' between literals that occur in a clause [l, l']. If a cycle
a → b → c → a is found, there is also a cycle ¬a → ¬c → ¬b → ¬a and therefore a ≡ b ≡ c can be
shown and applied to F by replacing b and c by a. Finally, double literal occurrences and
tautologies are removed.
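A minimal sketch of EE, assuming clauses are frozensets of DIMACS-style integers: the BIG is built
from the binary clauses, strongly connected components are found with a recursive version of
Tarjan's algorithm (adequate only for small inputs), and every literal is replaced by the literal
with the smallest variable in its component. All function names are hypothetical, and a component
containing both l and ¬l (an unsatisfiable formula) is not handled.

```python
from collections import defaultdict

def binary_implication_graph(formula):
    """Each binary clause [a, b] contributes the edges -a -> b and -b -> a."""
    graph = defaultdict(set)
    for clause in formula:
        if len(clause) == 2:
            a, b = clause
            graph[-a].add(b)
            graph[-b].add(a)
    return graph

def strongly_connected_components(graph):
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return sccs

def equivalence_elimination(formula):
    """Return (rewritten formula, map from literal to its representative)."""
    representative = {}
    for component in strongly_connected_components(binary_implication_graph(formula)):
        if len(component) > 1:
            rep = min(component, key=abs)   # keep the smallest variable
            for lit in component:
                representative[lit] = rep
                representative[-lit] = -rep
    result = []
    for clause in formula:
        new_clause = frozenset(representative.get(lit, lit) for lit in clause)
        if any(-lit in new_clause for lit in new_clause):
            continue                        # drop tautologies
        if new_clause not in result:
            result.append(new_clause)
    return result, representative
```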

Let F be ([1,¬2]_1, [¬1,2]_2, [1,2,3]_3, [¬1,¬3]_4, [¬3,4]_5, [¬1,¬4]_6). The index i of a clause C_i
gives the position of the clause in the formula. The order to apply techniques is EE, VE and finally
BCE. EE will find 1 ≡ 2 based on the clauses C_1 and C_2. Thus, it replaces each occurrence of 2
with 1, since 1 is the smaller variable. This step alters C_3 to C_7 = [1,3]. Now VE on variable 3
detects that there are 3 clauses in which 3 occurs. The single resolvent that can be built is
C_7 ⊗ C_5 = [1,4]. Finally, BCE removes the two remaining clauses, because all literals of each
clause are blocking literals. Since the resulting formula is empty, it is satisfied by any
interpretation. However, the original formula is clearly not satisfied by every interpretation; for
example, assigning both 1 and 4 to true falsifies C_6.

2.2 Equivalence-Preserving Techniques 

Equivalence-preserving techniques can be applied in any order, because the preprocessed formula is
equivalent to the original one. When the following techniques are combined with
satisfiability-preserving techniques, the order of the applied techniques has to be stored to be
able to undo all changes correctly.

Hidden Tautology Elimination (HTE) [3] is based on the clause extension hidden literal addition
(HLA). After the clause C is extended by HLA, C is removed if it is a tautology. The HLA of
a clause C with respect to a formula F is computed as follows: Let l be a literal of C and
[l', l] ∈ F \ {C}. If such a literal l' can be found, C is extended by C := C ∪ {¬l'}. This extension
is applied until a fixed point is reached. HLA can be regarded as the opposite operation of
self-subsuming resolution. The algorithm is linear time in the number of variables [4]. An example
for HTE is the formula F = ([1,3], [¬2,3], [1,2]). Extending the clause C_1 stepwise can look as
follows: C_1 = [1,3,¬2] with C_3. Next, C_1 = [1,3,¬2,2] with C_2, so that it becomes a tautology
and can be removed.
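The HLA extension and the resulting HTE check can be sketched in Python. The sketch assumes
clauses are frozensets of DIMACS-style integers and does not aim for the linear running time
mentioned above; the function names are hypothetical.

```python
def hidden_literal_addition(clause, others):
    """Extend `clause` using the binary clauses in `others` (F without the clause)."""
    extended = set(clause)
    changed = True
    while changed:                      # repeat until a fixed point is reached
        changed = False
        for other in others:
            if len(other) != 2:
                continue
            a, b = other
            for l, l_prime in ((a, b), (b, a)):
                # [l', l] with l in C allows adding the negation of l'
                if l in extended and -l_prime not in extended:
                    extended.add(-l_prime)
                    changed = True
    return extended

def hidden_tautology_elimination(formula):
    """Remove every clause whose HLA extension is a tautology."""
    remaining = list(formula)
    for clause in list(formula):
        # Use the current formula so that removed clauses justify no further removal.
        rest = [c for c in remaining if c != clause]
        extended = hidden_literal_addition(clause, rest)
        if any(-lit in extended for lit in extended):
            remaining = rest
    return remaining
```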

Probing [5] is a technique to simplify the formula by propagating variables in both polarities
l and ¬l separately and comparing their implications, or by propagating all literals of a clause
C = [l_1, ..., l_n], because it is known that in both cases one of the candidates has to be satisfied.

Probing a single variable can find a conflict and thus finds a new unit. The following example
illustrates the other cases:

BCP(1)  ⇒ 2, 3, 4, ¬5, ¬7
BCP(¬1) ⇒ 2, ¬4, 6, 7

To create a complete assignment, variable 1 has to be assigned and both possible assignments
imply 2, so that 2 can be set to true immediately. Furthermore, the equivalences 4 ≡ 1 and 7 ≡ ¬1
can be found. These equivalences can also be eliminated. Probing all literals of a clause can find
only new units.
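Probing a single variable can be sketched with a naive unit propagation routine. Clauses are
assumed to be lists of DIMACS-style integers, and bcp and probe_variable are hypothetical helper
names; the propagation below rescans all clauses instead of using watched literals.

```python
def bcp(formula, assumptions):
    """Return all literals implied by `assumptions`, or None on a conflict."""
    assigned = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in formula:
            if any(lit in assigned for lit in clause):
                continue                                  # clause is satisfied
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return None                               # clause is falsified
            if len(open_lits) == 1:
                assigned.add(open_lits[0])                # unit clause
                changed = True
    return assigned

def probe_variable(formula, var):
    """Return (new units, literals equivalent to var) found by probing var."""
    pos, neg = bcp(formula, [var]), bcp(formula, [-var])
    if pos is None and neg is None:
        raise ValueError("formula is unsatisfiable")
    if pos is None:
        return {-var}, set()     # var leads to a conflict, so -var is a unit
    if neg is None:
        return {var}, set()
    units = pos & neg            # implied by both polarities
    equivalent = {lit for lit in pos if -lit in neg and abs(lit) != var}
    return units, equivalent
```

For a formula that produces the two propagation results of the example, the sketch reports the
unit 2 and the literals 4 and ¬7 as equivalent to 1.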

Vivification (also called Asymmetric Branching) [12] reduces the length of a clause by propagating
the negated literals of a clause C = [l_1, ..., l_n] iteratively until one of the following three
cases occurs:

1. BCP({¬l_1, ..., ¬l_i}) results in an empty clause for i < n.

2. BCP({¬l_1, ..., ¬l_i}) implies another literal l_j of C with i < j ≤ n.

3. BCP({¬l_1, ..., ¬l_i}) implies another negated literal ¬l_j of C with i < j ≤ n.

In the first case, the unsatisfying partial assignment is disallowed by adding a clause
C' = [l_1, ..., l_i]. The clause C' subsumes C. The implication ¬l_1 ∧ ... ∧ ¬l_i → l_j in the
second case results in the clause C' = [l_1, ..., l_i, l_j] that also subsumes C. Formulating the
third case into a clause C' = [l_1, ..., l_i, ¬l_j] subsumes C by applying self subsumption to
C'' = C ⊗ C' = [l_1, ..., l_{j-1}, l_{j+1}, ..., l_n].
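The three cases can be sketched as follows. The sketch assumes that clauses are lists of
DIMACS-style integers and that propagation runs on the formula without the clause C itself; bcp
and vivify are hypothetical names, and the function stops at the first case that applies.

```python
def bcp(formula, assumptions):
    """Return all literals implied by `assumptions`, or None on a conflict."""
    assigned = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in formula:
            if any(lit in assigned for lit in clause):
                continue
            open_lits = [lit for lit in clause if -lit not in assigned]
            if not open_lits:
                return None
            if len(open_lits) == 1:
                assigned.add(open_lits[0])
                changed = True
    return assigned

def vivify(clause, others):
    """Try to shorten `clause` using the clauses in `others` (F without the clause)."""
    for i in range(1, len(clause)):
        prefix = clause[:i]
        implied = bcp(others, [-lit for lit in prefix])
        if implied is None:
            return prefix                          # case 1: C' = [l_1..l_i]
        for lit in clause[i:]:
            if lit in implied:
                return prefix + [lit]              # case 2: C' = [l_1..l_i, l_j]
        for j in range(i, len(clause)):
            if -clause[j] in implied:
                return clause[:j] + clause[j + 1:]  # case 3: remove l_j from C
    return clause
```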

Extended Resolution (ER) [1] introduces a new variable v to a formula that is equivalent to a
disjunction of literals, v ≡ l ∨ l'. All clauses in F are updated by removing the pair and adding
the new variable instead. It has been shown that ER is good for shrinking the proof size for
unsatisfiable formulas. Applying ER during search as in [1] resulted in a lower performance of riss,
so that this technique has been put into the preprocessor and replaces the most frequent literal
pairs. Still, no deep performance analysis has been done on this technique in the preprocessor, but
it seems to boost the performance on unsatisfiable instances.
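Replacing the most frequent literal pair can be sketched as below. The three definition clauses
that encode v ≡ l ∨ l' are the standard encoding and an assumption of this sketch; the text does
not state how Coprocessor adds them. Clauses are frozensets of DIMACS-style integers and the
function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def replace_most_frequent_pair(formula, num_vars):
    """Return (new formula, index of the fresh variable)."""
    counts = Counter()
    for clause in formula:
        counts.update(combinations(sorted(clause), 2))
    if not counts:
        return list(formula), num_vars
    (a, b), _ = counts.most_common(1)[0]
    v = num_vars + 1                                   # fresh variable
    rewritten = [
        (clause - {a, b}) | {v} if a in clause and b in clause else clause
        for clause in formula
    ]
    # Assumed definition clauses for v <-> (a or b)
    definition = [frozenset([-v, a, b]), frozenset([v, -a]), frozenset([v, -b])]
    return rewritten + definition, v
```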

3 Coprocessor 

The preprocessor of riss, Coprocessor, implements all the techniques presented in Sect. 2 and
introduces many algorithm parameters. A description of these parameters can be found in the
help of Coprocessor.¹ The techniques are executed in a loop on F, so that for example the result
of HTE can be processed with VE and afterwards HTE tries to eliminate clauses again.

It is possible to maintain a blacklist and a white-list of variables. Variables on the white-list
are tabooed for any non-model-preserving techniques so that their semantics is the same in F'.
Variables on the blacklist are always removed by VE.

Furthermore, the resulting formula can be compressed. If variables are removed or are already 
assigned a value, the variables of the reduct of F' are usually not dense any more. Giving the 
reduct to another solver increases its memory usage unnecessarily. To overcome this weakness, a 
compressor has been built into Coprocessor that fills these gaps with variables that still occur in 
F' and stores the already assigned variables for postprocessing the model. The compression cannot 
be combined with the white-list. 
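The renumbering idea can be sketched in a few lines of Python. Clauses and models are assumed to
be lists of DIMACS-style integers; compress and decompress_model are hypothetical names, and the
table layout follows the order-dependent mapping used in the map file (entry i is the original
variable represented by variable i).

```python
def compress(formula):
    """Renumber the variables that still occur so that they are dense again."""
    occurring = sorted({abs(lit) for clause in formula for lit in clause})
    to_new = {old: new for new, old in enumerate(occurring, start=1)}
    compressed = [
        [to_new[lit] if lit > 0 else -to_new[-lit] for lit in clause]
        for clause in formula
    ]
    return compressed, occurring   # occurring[i - 1] is the original variable of i

def decompress_model(model, table, units):
    """Map a model of the compressed formula back and add the stored units."""
    original = [table[lit - 1] if lit > 0 else -table[-lit - 1] for lit in model]
    return sorted(original + list(units), key=abs)
```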

Another transformation that can be applied by the presented preprocessor is the conversion
of encoded CSP domains from the direct encoding to the regular encoding as described in [10].

3.1 The Map File Format 

A map file is used to store the preprocessing information that is necessary to postprocess a model 
of F' such that it becomes a model for F again. The map file and the model for F' can be used 
to restore the model for F by giving this information to Coprocessor. The following information 
has to be stored to be able to do so: 



1 The source code can be found at www.ki.inf.tu-dresden.de/-norbert



Once                   Per elimination step
---------------------  ----------------------------
Compression Table      Variable Elimination
Equivalence Classes    Blocked Clause Elimination
                       Equivalence Elimination Step


The map file is divided into two parts. A partial example file for illustration is given in Fig. 1.
The format is described based on this example file. Each occurring case is also covered in the
description. The first line has to state "original variables" (line 1). This number is specified in the
next line (line 2). Next, the compression information is given by beginning with either "compress
table" (line 3), if there is a table, or "no table", if there is no compression. Afterwards, the tables
are given where each starts with a line "table k v" and k represents the number of the table and
v is the number of variables before the applied compression (line 4). The next line gives the
compression by simply giving a mapping that depends on the order: the i-th number in the line is
the variable that is represented by variable i in the compressed formula (line 5). The line is closed
by a 0, so that a standard clause parser can be used. The next line introduces the assignments in
the original formula by saying "units k" (line 6). The following line lists all the literals that have
been assigned true in the original formula and is also terminated by 0 (line 7). The compression
is completed with a line stating "end table" (line 8). At the moment, only a single compression is
supported, and thus, k is always 0. Since there is only a single compression, it is applied after
applying all other techniques and therefore the following details are given with respect to the
decompressed preprocessed formula F'.

The next static information is the literals of the EE classes. They are introduced by a line
"ee table" (line 9). The following lines represent the classes where the first element is the
representative of the class that is in F' (line 10-12). Each class is ordered ascending, so that
the EE information can be stored as a tree and the first element is the smallest one. Again, each
class is terminated by a 0.

Finally, the postprocess stack is given and preluded with a line "postprocess stack" (line 13).
Afterwards the eliminations of BCE and VE are stored in the order they have been performed.
BCE is prefaced with a line "bce l" where l is the blocking literal (line 15,17). The next line gives
the according blocked clause (line 16,18). For VE the first line is "ve v n" where v is the
eliminated variable and n is the number of clauses that have been replaced (line 20,22). The
following n lines give the according clauses (line 21,23-26). Finally, for EE it is only stated that
EE has been applied by writing a line "ee", because postprocessing EE depends also on the
variables that are present at the moment (line 14). Some of the variables might already be removed
at the point EE has been run, so that it is mandatory to store this information.

     1: original variables
     2: 30867
     3: compress tables
     4: table 30867
     5: 1 2 3 5 6 7 9 10 11
     6: units
     7: -31 32
     8: end table
     9: ee table
    10: 1 -19
    11: 2 -20
    12: . . . 30666 -30822
    13: postprocess stack
    14: ee
    15: bce 523
    16: -81 523 -6716
    17: bce 10623
    18: -10429 10623
    19: . . .
    20: ve 812 1
    21: -812 -74
    22: ve 6587 4
    23: 6587 6615
    24: -79 6587
    25: 30296

Fig. 1: Example map file
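How a postprocess stack of this kind can be replayed is illustrated by the following Python sketch.
It shows one common way to undo BCE, VE and EE and is not taken from the Coprocessor sources;
the tuple layout of the stack entries, the function name extend_model and the convention that
unassigned variables count as false are assumptions of this sketch. Decompression and parsing of
the map file are left out.

```python
def extend_model(model, stack, ee_classes):
    """Replay `stack` in reverse order; return a dict variable -> bool."""
    assignment = {abs(lit): lit > 0 for lit in model}

    def value(lit):
        # Unassigned variables are treated as false (assumption of this sketch).
        return assignment.get(abs(lit), False) == (lit > 0)

    def satisfied(clause):
        return any(value(lit) for lit in clause)

    for entry in reversed(stack):
        kind = entry[0]
        if kind == "bce":
            _, blocking_lit, clause = entry
            if not satisfied(clause):
                # Flipping the blocking literal satisfies the blocked clause.
                assignment[abs(blocking_lit)] = blocking_lit > 0
        elif kind == "ve":
            _, var, clauses = entry
            assignment[var] = False
            if not all(satisfied(c) for c in clauses):
                assignment[var] = True
        elif kind == "ee":
            # Each class lists its representative first, as in the "ee table".
            for rep, *members in ee_classes:
                for lit in members:
                    assignment[abs(lit)] = value(rep) == (lit > 0)
    return assignment
```

Applied to the worked example of Sect. 2.1 with an empty model of the empty formula F', the replay
assigns the variables 1 and 2 to true and 3 to false, which satisfies all six original clauses.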



3.2 Preprocessor Comparison 

A comparison of the formula reductions of Coprocessor and the current standard preprocessor
SatELite is given in Fig. 2 and has been performed on 1155 industrial and crafted instances from
recent SAT Competitions and SAT Races.² The relative reduction of the clauses by Coprocessor
and SatELite is presented. Due to ER, Coprocessor can increase the number of clauses, whereby the
average length is still reduced. Coprocessor is also able to reduce the number of clauses more than
SatELite. The instances are ordered by the reduction of SatELite so that the plot for Coprocessor
produces peaks.

2 For more details visit http://www.ki.inf.tu-dresden.de/-norbert/paperdata/WLP2011.html

Fig. 2: Relative reduction of SatELite and Coprocessor

Since SatELite [3] and MiniSAT [11] have been developed by the same authors, the run times
of MiniSAT with the two preprocessors are compared in Fig. 3. Comparing these run times of
MiniSAT (MS) combined with the preprocessors, it can be clearly seen that by using a preprocessor
the performance of the solver is much higher. Furthermore, the combination with Coprocessor
(MS+Co) solves more instances than SatELite (MS+S) for most of the timeouts.

Fig. 3: Runtime comparison of MiniSAT combined with Coprocessor and SatELite



4 Conclusion and Future Work 



This work introduces the SAT preprocessor Coprocessor that implements almost all known
preprocessing techniques and some additional features. Experiments showed that the default
Coprocessor performs better than SatELite when combined with MiniSAT 2.2. To suit its techniques
better to applications, Coprocessor provides many parameters that can be optimized for special use
cases. Additionally, a map file format is presented that is used to store the preprocessing
information. This file can be used to reconstruct the model for the original formula if the model
for the preprocessed formula is given.

Future development of this preprocessor includes adding the latest techniques such as HLE
and HLA [4,5] and parallelizing it to be able to use multi-core architectures. Furthermore, the
execution order of the techniques will be relaxed, so that any order can be applied to the input
formula.